Abstract 

In a recent paper it was shown that No Free Lunch results [5] hold 
for any subset F of the set of all possible functions from a finite set X to 
a finite set $\mathcal{Y}$ iff F is closed under permutation of X. In this article, we 
prove that the number of those subsets is negligible compared to the 
overall number of possible subsets. Further, we present some arguments 
why problem classes relevant in practice are not likely to be closed under 
permutation. 

1 Introduction 

The No Free Lunch (NFL) theorems, roughly speaking, state that all search 
algorithms have the same average performance over all possible objective 
functions $f : X \to \mathcal{Y}$, where the search space X as well as the cost-value space $\mathcal{Y}$ 
are finite sets. However, it has been argued that in practice one does not 
need an algorithm that performs well on all possible functions, but only on a 
subset that arises from the real-world problems at hand. Further, it has been 
shown that for pseudo-Boolean functions restrictions of the complexity lead to 
subsets of functions on which some algorithms perform better than others (e.g., 
complexity has been defined in terms of the number of local minima, and also 
based on the size of the smallest OBDD representations of the functions). 

Recently, a sharpened version of the NFL theorem has been proven that 
states that NFL results hold for any subset F of the set of all possible functions 
if and only if F is closed under permutation. Based on this important result, 
we can derive classes of functions where NFL does not hold simply by showing 
that these classes are not closed under permutation (c.u.p.). This leads to the 
encouraging results in this paper: It is proven that the fraction of subsets c.u.p. 
is so small that it can be neglected. In addition, arguments are given why we 
think that objective functions resulting from important classes of real-world 
problems are likely not to be c.u.p. 

In the following section, we give some basic definitions and concisely restate 
the sharpened NFL theorem. Then we derive the number of subsets 
c.u.p. Finally, we discuss some observations regarding structured search spaces 
and closure under permutation. 

2 Preliminaries 

We consider a finite search space X and a finite set of cost values $\mathcal{Y}$. Let 
$\mathcal{F} = \mathcal{Y}^X$ be the set of all objective functions $f : X \to \mathcal{Y}$ to be optimized 
(also called fitness, energy, or cost functions). NFL theorems are concerned 
with non-repeating black-box search algorithms (referred to simply as 
algorithms for brevity) that choose a new exploration point in the search space 
depending on the complete history of prior explorations: Let the sequence 
$T_m = ((x_1, f(x_1)), (x_2, f(x_2)), \dots, (x_m, f(x_m)))$ represent m non-repeating 
explorations $x_i \in X$, $\forall i \neq j : x_i \neq x_j$, and their cost values $f(x_i) \in \mathcal{Y}$. An algorithm 
a appends a pair $(x_{m+1}, f(x_{m+1}))$ to this sequence by mapping $T_m$ to a new 
point $x_{m+1}$, $\forall i \leq m : x_{m+1} \neq x_i$. Generally, the performance of an algorithm a after 
m iterations with respect to a function f depends on the sequence of cost 
values $Y(f, m, a) = \langle f(x_1), f(x_2), \dots, f(x_m) \rangle$ the algorithm has produced. Let the 
function c denote a performance measure mapping sequences of $\mathcal{Y}$-values to the real 
numbers (e.g., in the case of function minimization a performance measure that 
returns the minimum $\mathcal{Y}$-value in the sequence could be a reasonable choice). 

Let $\pi : X \to X$ be a permutation (i.e., bijective function) of X. The set of 
all permutations of X is denoted by $\mathcal{P}(X)$. A set $F \subseteq \mathcal{F}$ is said to be closed 
under permutation (c.u.p.) if for any $\pi \in \mathcal{P}(X)$ and any function $f \in F$ the 
function $f \circ \pi$ is also in F. 
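
The definition can be checked mechanically for small cases. The following is a minimal sketch, assuming that X = {0, ..., n-1} and that a function is encoded as the tuple of its values; the helper names `compose` and `is_cup` are hypothetical and do not appear in the paper.

```python
from itertools import permutations

def compose(f, pi):
    """Return f o pi, i.e. the function x -> f(pi(x)), as a value tuple."""
    return tuple(f[pi[x]] for x in range(len(pi)))

def is_cup(F, n):
    """Check whether the set F of functions on X = {0..n-1} is closed under permutation."""
    F = set(F)
    return all(compose(f, pi) in F
               for f in F
               for pi in permutations(range(n)))

# Functions X -> {0, 1} with |X| = 3, written as value tuples (f(0), f(1), f(2)).
orbit = {(0, 1, 1), (1, 0, 1), (1, 1, 0)}   # all functions with one 0 and two 1s
single = {(0, 1, 1)}                        # one function taken from that set

print(is_cup(orbit, 3))   # closed: permuting X only reorders the values
print(is_cup(single, 3))  # not closed: e.g. (1, 0, 1) is missing
```

The exhaustive loop over all permutations is only practical for very small X; it is meant to mirror the definition, not to be efficient.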

Theorem 1 (NFL). For any two algorithms a and b, any value $k \in \mathbb{R}$, and 
any performance measure c 

$$\sum_{f \in F} \delta(k, c(Y(f, m, a))) = \sum_{f \in F} \delta(k, c(Y(f, m, b)))$$

iff F is c.u.p. 

Herein, $\delta$ denotes the Kronecker function ($\delta(i,j) = 1$ if $i = j$, $\delta(i,j) = 0$ 
otherwise). A proof of Theorem 1 is given in the paper that introduced this 
sharpened version. This theorem implies that for 
any two algorithms a and b and any function $f_a \in F$, where F is c.u.p., there is 
a function $f_b \in F$ on which b has the same performance as a on $f_a$. 
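
To make the statement concrete, here is a small sketch that evaluates both sides of the equation in Theorem 1 for a toy setting. The search space, the two algorithms `alg_a` and `alg_b`, and the choice of `min` as performance measure are illustrative assumptions, not taken from the paper; an algorithm is modeled as a function that maps the history of (point, cost) pairs to the next unvisited point.

```python
from collections import Counter
from itertools import product

X = (0, 1, 2)  # toy search space (assumption)
Y = (0, 1)     # toy cost values (assumption)

def alg_a(history):
    """Fixed order: visit the smallest unvisited point."""
    visited = {x for x, _ in history}
    return min(x for x in X if x not in visited)

def alg_b(history):
    """Adaptive: start at 2, then branch on the last observed cost value."""
    if not history:
        return 2
    visited = {x for x, _ in history}
    unvisited = [x for x in X if x not in visited]
    return unvisited[0] if history[-1][1] == 0 else unvisited[-1]

def cost_sequence(alg, f, m):
    """Y(f, m, alg): cost values seen during m non-repeating explorations."""
    history = []
    for _ in range(m):
        x = alg(history)
        history.append((x, f[x]))
    return [y for _, y in history]

def performance_counts(alg, F, m, c=min):
    """For each value k, count the functions f in F with c(Y(f, m, alg)) == k."""
    return Counter(c(cost_sequence(alg, f, m)) for f in F)

all_functions = list(product(Y, repeat=len(X)))   # c.u.p.
orbit = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]         # c.u.p.
single = [(0, 1, 1)]                              # not c.u.p.

for m in (1, 2, 3):
    for F in (all_functions, orbit):
        assert performance_counts(alg_a, F, m) == performance_counts(alg_b, F, m)

# On the set that is not c.u.p., the two algorithms can be told apart.
print(performance_counts(alg_a, single, 1), performance_counts(alg_b, single, 1))
```

Each `Counter` holds, for every k, the value of one of the two sums in Theorem 1. The counters agree on the two c.u.p. sets and differ on the single-function set, where `alg_a` sees cost 0 first and `alg_b` sees cost 1.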

3 Fraction of Subsets Closed under Permutation 

Let $\mathcal{F} = \mathcal{Y}^X$ be the set of functions mapping $X \to \mathcal{Y}$. There exist $2^{(|\mathcal{Y}|^{|X|})} - 1$ 
non-empty subsets of $\mathcal{F}$. We want to calculate the fraction of subsets that are 
c.u.p. 

Theorem 2. The number of non-empty subsets of $\mathcal{Y}^X$ that are c.u.p. is given 
by 

$$2^{\binom{|X| + |\mathcal{Y}| - 1}{|X|}} - 1 .$$

The proof is given in the appendix. 
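
For very small X and $\mathcal{Y}$ the count in Theorem 2 can be compared with an exhaustive enumeration. The sketch below is an illustration under the assumption that functions are encoded as value tuples; `count_cup_subsets` and `theorem2_count` are hypothetical helper names.

```python
from itertools import combinations, permutations, product
from math import comb

def count_cup_subsets(n_x, n_y):
    """Brute-force count of non-empty c.u.p. subsets of all functions X -> Y."""
    functions = list(product(range(n_y), repeat=n_x))
    perms = list(permutations(range(n_x)))

    def is_cup(F):
        return all(tuple(f[pi[x]] for x in range(n_x)) in F
                   for f in F for pi in perms)

    count = 0
    for size in range(1, len(functions) + 1):
        for subset in combinations(functions, size):
            if is_cup(frozenset(subset)):
                count += 1
    return count

def theorem2_count(n_x, n_y):
    """Closed form stated in Theorem 2."""
    return 2 ** comb(n_x + n_y - 1, n_x) - 1

for n_x, n_y in [(2, 2), (3, 2), (2, 3)]:
    print(n_x, n_y, count_cup_subsets(n_x, n_y), theorem2_count(n_x, n_y))
```

The enumeration visits every subset of $\mathcal{Y}^X$, so it is only feasible while $|\mathcal{Y}|^{|X|}$ stays around ten or below.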

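Figure 1 shows a plot of the fraction of non-empty subsets c.u.p., i.e., 

$$\left( 2^{\binom{|X| + |\mathcal{Y}| - 1}{|X|}} - 1 \right) \Big/ \left( 2^{(|\mathcal{Y}|^{|X|})} - 1 \right) ,$$

versus the cardinality of X for different values of $|\mathcal{Y}|$. The fraction decreases for 
increasing $|X|$ as well as for increasing $|\mathcal{Y}|$. Already for small $|X|$ and $|\mathcal{Y}|$ the 
fraction almost vanishes, e.g., for a Boolean function $f : \{0,1\}^3 \to \{0,1\}$, where 
$|X| = 8$ and $|\mathcal{Y}| = 2$, the fraction is $(2^9 - 1)/(2^{256} - 1) \approx 4.4 \times 10^{-75}$. 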
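
The fraction can be evaluated exactly with integer arithmetic as long as $|\mathcal{Y}|^{|X|}$ is small enough to serve as an exponent. The following sketch assumes the two counts given above; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, log10

def cup_fraction(n_x, n_y):
    """Exact fraction of non-empty subsets of Y^X that are c.u.p."""
    n_cup = 2 ** comb(n_x + n_y - 1, n_x) - 1
    n_all = 2 ** (n_y ** n_x) - 1
    return Fraction(n_cup, n_all)

def log10_fraction(n_x, n_y):
    """log10 of the fraction, computed from the integers to avoid float underflow."""
    frac = cup_fraction(n_x, n_y)
    return log10(frac.numerator) - log10(frac.denominator)

# Boolean functions {0,1}^3 -> {0,1}: |X| = 8, |Y| = 2.
print(cup_fraction(8, 2) == Fraction(2**9 - 1, 2**256 - 1))
print(log10_fraction(8, 2))

# The fraction shrinks when either cardinality grows.
print([round(log10_fraction(n_x, 2), 1) for n_x in range(2, 9)])
```

Working with logarithms of the numerator and denominator keeps the result representable even when the fraction itself is far below the smallest positive float.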



4 Search Spaces with Neighborhood Relations 

In the previous section, we have shown that the fraction of subsets c.u.p. is 
close to zero already for small search and cost-value spaces. Still, the absolute 
number of subsets c.u.p. grows rapidly with increasing $|X|$ and $|\mathcal{Y}|$. What if 
these classes of functions are the "important" ones, i.e., those we are dealing 
with in practice? In this section, we define some quite general constraints on 
functions important in practice that induce classes of functions that are not 
c.u.p. 

Figure 1: The ordinate gives the fraction of subsets closed under permutation 
on logarithmic scale given the cardinality of the search space X. The different 
curves correspond to different cardinalities of the codomain $\mathcal{Y}$. 

We believe that two assumptions can be made for most of the functions we 
are dealing with in real-world optimization: First, the search space has some 
structure. Second, the set of objective functions we are interested in fulfills 
some constraints defined based on this structure. More formally, there exists a 
non-trivial neighborhood relation on X based on which constraints on the set 
of functions under consideration are formulated. For example, with respect to a 
neighborhood relation we can define concepts like ruggedness or local optimality 
and constraints like upper bounds on the ruggedness or on the maximum number 
of local minima. Intuitively, it is likely that in a function class c.u.p. there exists 
a function that violates such constraints. 

We define a simple neighborhood relation on X as a symmetric function 
$n : X \times X \to \{0, 1\}$. Two elements $x_i, x_j \in X$ are called neighbors iff $n(x_i, x_j) = 1$. 
We call a neighborhood non-trivial iff $\exists x_i, x_j \in X : x_i \neq x_j \wedge n(x_i, x_j) = 1$ and 
$\exists x_k, x_l \in X : x_k \neq x_l \wedge n(x_k, x_l) = 0$. It holds: 

Theorem 3. A non-trivial neighborhood on X is not invariant under permuta- 
tions of X. 

Proof. It holds $\exists x_i, x_j, x_k, x_l \in X : x_i \neq x_j \wedge x_k \neq x_l \wedge n(x_i, x_j) = 0 \wedge 
n(x_k, x_l) = 1$. For any permutation $\pi$ that maps $x_i$ and $x_j$ onto $x_k$ and $x_l$, 
respectively, the invariance property, $\forall x_a, x_b \in X : n(x_a, x_b) = n(\pi(x_a), \pi(x_b))$, is 
violated. □ 
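
The argument of the proof can be replayed on a small example. The sketch below assumes a path-shaped neighborhood on four points, which is an illustrative choice and not part of the paper; `is_invariant` is a hypothetical helper.

```python
from itertools import permutations

X = range(4)

def n(a, b):
    """Path neighborhood on 0-1-2-3: neighbors iff the points differ by one."""
    return 1 if abs(a - b) == 1 else 0

def is_invariant(n, pi):
    """True iff n(a, b) == n(pi(a), pi(b)) for all a, b in X."""
    return all(n(a, b) == n(pi[a], pi[b]) for a in X for b in X)

identity = (0, 1, 2, 3)
reversal = (3, 2, 1, 0)   # a symmetry of the path
swap = (0, 2, 1, 3)       # maps the non-neighbors 0, 2 onto the neighbors 0, 1

violating = [pi for pi in permutations(X) if not is_invariant(n, pi)]
print(is_invariant(n, identity), is_invariant(n, reversal), is_invariant(n, swap))
print(len(violating), "of 24 permutations violate invariance")
```

Individual permutations such as the reversal may preserve the neighborhood; the theorem only requires that at least one permutation does not, and here all but the two symmetries of the path fail.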



Remark 1. Assume the search space X can be decomposed as $X = X_1 \times \dots \times X_l$, 
$l > 1$, and let on one component $X_i$ exist a non-trivial neighborhood 
$n_i : X_i \times X_i \to \{0, 1\}$. This neighborhood induces a non-trivial neighborhood on X, 
where two points are neighbored iff their i-th components are neighbored with 
respect to $n_i$. Thus, the constraints discussed below need only refer to a single 
component. 
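
As an illustration of Remark 1, the following sketch builds the induced neighborhood on a product of two components. The concrete components and the names `n1` and `induced` are assumptions made for this example.

```python
from itertools import product

X1 = range(3)       # component carrying a neighborhood (assumption: a path 0-1-2)
X2 = ("a", "b")     # second component without any structure

def n1(u, v):
    """Non-trivial neighborhood on X1: neighbors iff the values differ by one."""
    return 1 if abs(u - v) == 1 else 0

def induced(p, q, i=0):
    """Two points of X1 x X2 are neighbored iff their i-th components are."""
    return n1(p[i], q[i])

X = list(product(X1, X2))
pairs = [(p, q) for p in X for q in X if p != q]

# Non-triviality: some distinct pair is neighbored and some distinct pair is not.
has_neighbors = any(induced(p, q) == 1 for p, q in pairs)
has_non_neighbors = any(induced(p, q) == 0 for p, q in pairs)
print(has_neighbors, has_non_neighbors)
```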

Remark 2. The neighborhood relation need not be the canonical one (e.g., Ham- 
ming-distance for Boolean search spaces). Instead, it can be based on "pheno- 
typic" properties (e.g., if integers are encoded by bit-strings, then the bit-strings 
can be defined as neighbored iff the corresponding integers are). 
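
The difference between the two kinds of neighborhood in Remark 2 can be seen in a short sketch. It assumes the standard binary encoding of non-negative integers and defines integers as neighbored iff they differ by one; both are illustrative assumptions.

```python
def hamming_neighbors(a, b):
    """Canonical neighborhood: bit-strings at Hamming distance one."""
    return sum(u != v for u, v in zip(a, b)) == 1

def phenotypic_neighbors(a, b):
    """Bit-strings are neighbored iff the integers they encode differ by one."""
    return abs(int(a, 2) - int(b, 2)) == 1

# 3 = "011" and 4 = "100" are adjacent integers but differ in every bit.
print(hamming_neighbors("011", "100"), phenotypic_neighbors("011", "100"))
# 0 = "000" and 4 = "100" differ in one bit but are far apart as integers.
print(hamming_neighbors("000", "100"), phenotypic_neighbors("000", "100"))
```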

Now we describe some constraints that are defined with respect to a 
neighborhood relation and are, to our minds, relevant in practice. For this purpose, 
we assume a metric $d_{\mathcal{Y}} : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ on $\mathcal{Y}$, e.g., in the typical case of real-valued 
fitness functions, $\mathcal{Y} \subset \mathbb{R}$, the Euclidean distance. 

First, we show how a constraint on steepness (closely related to the concept 
of strong causality) leads to a set of functions that is not c.u.p. Based on a 
neighborhood relation on the search space, we can define a simple measure of 
maximum steepness of a function $f \in \mathcal{F}$ by 

$$s^{\max}(f) = \max_{x_i, x_j \in X \,\wedge\, n(x_i, x_j) = 1} d_{\mathcal{Y}}(f(x_i), f(x_j)) .$$

Further, for a function $f \in \mathcal{F}$, we define the diameter of its range as 

$$d^{\max}(f) = \max_{x_i, x_j \in X} d_{\mathcal{Y}}(f(x_i), f(x_j)) .$$

Corollary 1. If the maximum steepness $s^{\max}(f)$ of every function f in a 
non-empty subset $F \subset \mathcal{F}$ is constrained to be smaller than the maximal possible 
$\max_{f \in F} d^{\max}(f)$, then F is not c.u.p. 

Proof. Let $g = \arg\max_{f \in F} d^{\max}(f)$ and let $x_i$ and $x_j$ be two points with 
property $d_{\mathcal{Y}}(g(x_i), g(x_j)) = d^{\max}(g)$. Since the neighborhood on X is non-trivial there 
exist two neighboring points $x_k$ and $x_l$. There exists a permutation $\pi$ that maps 
$x_k$ and $x_l$ onto $x_i$ and $x_j$, respectively. If F is c.u.p., the function $g \circ \pi$ is in F. This function 
has steepness $s^{\max}(g \circ \pi) = d^{\max}(g) = \max_{f \in F} d^{\max}(f)$, which contradicts the 
steepness constraint. □ 
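
The construction in the proof can be observed numerically. The sketch below assumes a path neighborhood on four points and the absolute difference as metric on real cost values; the function g and all helper names are illustrative assumptions.

```python
from itertools import permutations

X = range(4)
neighbors = [(a, b) for a in X for b in X if abs(a - b) == 1]  # path 0-1-2-3

def s_max(f):
    """Maximum steepness: largest |f(a) - f(b)| over neighboring points."""
    return max(abs(f[a] - f[b]) for a, b in neighbors)

def d_max(f):
    """Diameter of the range: largest |f(a) - f(b)| over all pairs of points."""
    return max(f) - min(f)

def compose(f, pi):
    """f o pi as a value tuple."""
    return tuple(f[pi[x]] for x in X)

g = (0, 1, 2, 3)   # smooth along the path
steepest = max(s_max(compose(g, pi)) for pi in permutations(X))
print(s_max(g), d_max(g), steepest)
```

Here g itself has steepness 1, but some permuted version of g places its smallest and largest values on neighboring points, so a class that contains g and bounds the steepness below the diameter cannot be c.u.p.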

As a second constraint, we consider the number of local minima, which is 
often regarded as a measure of complexity. For a function $f \in \mathcal{F}$ a point 
$x \in X$ is a local minimum iff $f(x) < f(x_i)$ for all neighbors $x_i$ of x. Given a 
function f and a neighborhood relation on X, we define $l^{\max}(f)$ as the maximal 
number of minima that functions with the same $\mathcal{Y}$-histogram as f can have (i.e., 
functions where the number of X-values that are mapped to a certain $\mathcal{Y}$-value 
are the same as for f, see appendix). In the appendix we prove that for any two 
functions f, g with the same $\mathcal{Y}$-histogram there exists a permutation $\pi \in \mathcal{P}(X)$ 
with $f \circ \pi = g$. Thus, it follows: 

Corollary 2. If the number of local minima of every function f in a 
non-empty subset $F \subset \mathcal{F}$ is constrained to be smaller than the maximal possible 
$\max_{f \in F} l^{\max}(f)$, then F is not c.u.p. 

For example, consider pseudo-Boolean functions $\{0, 1\}^n \to \mathbb{R}$ and let two points 
be neighbored iff they have Hamming distance one. Then the maximum number 
of local minima is $2^{n-1}$. 
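
The bound for the Hamming neighborhood can be checked for small n. The sketch below uses the parity function as one function that attains $2^{n-1}$ strict local minima; the parity function and the restriction of the exhaustive search to {0,1}-valued functions are choices made for this illustration.

```python
from itertools import product

def neighbors(x):
    """Points at Hamming distance one from the bit tuple x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def count_local_minima(f, n):
    """Number of points whose value is strictly below that of every neighbor."""
    return sum(all(f[x] < f[z] for z in neighbors(x))
               for x in product((0, 1), repeat=n))

n = 3
points = list(product((0, 1), repeat=n))

# Parity function: every even-parity point is a strict local minimum.
parity = {x: sum(x) % 2 for x in points}
print(count_local_minima(parity, n), 2 ** (n - 1))

# Exhaustive check over all {0,1}-valued functions on {0,1}^3.
best = max(count_local_minima(dict(zip(points, values)), n)
           for values in product((0, 1), repeat=len(points)))
print(best)
```

Two neighboring points cannot both be strict local minima, so the minima form a set of pairwise non-adjacent points of the hypercube, which is why no function exceeds $2^{n-1}$.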



5 Conclusion 

Based on the sharpened NFL theorem, we have shown that the statement "I'm only 
interested in a subset F of all possible functions, so the NFL theorems do not apply" 
is true with a probability close to one (if F is chosen uniformly and $\mathcal{Y}$ and X have 
reasonable cardinalities). Further, the statements "In my application domain, 
functions with maximum number of local minima are not realistic" and "For 
some components, the objective functions under consideration will not have the 
maximal possible steepness" lead to scenarios where NFL does not hold. 

A Proof of Theorem 2 



For the proof, we use the concept of $\mathcal{Y}$-histograms: We define a $\mathcal{Y}$-histogram 
(histogram for short) as a mapping $h : \mathcal{Y} \to \mathbb{N}_0$ such that $\sum_{y \in \mathcal{Y}} h(y) = |X|$. 
The set of all histograms is denoted $\mathcal{H}$. With any function $f : X \to \mathcal{Y}$ we 
associate the histogram $h(y) = |f^{-1}(y)|$ that counts the number of elements in 
X that are mapped to the same value $y \in \mathcal{Y}$ by f. Herein, $f^{-1}(y)$, $y \in \mathcal{Y}$, returns 
the preimage $\{x \mid f(x) = y\}$ of y under f. Further, we call two functions f, g h-equivalent 
iff they have the same histogram and we call the corresponding h-equivalence 
class $B_h \subseteq \mathcal{F}$ containing all functions with histogram h a basis class. Before 
we prove Theorem 2, we consider the following lemma that gives some basic 
properties of basis classes. 

Lemma 1. (a) There exist 

$$\binom{|X| + |\mathcal{Y}| - 1}{|X|}$$

pairwise disjoint basis classes and 

$$\bigcup_{h \in \mathcal{H}} B_h = \mathcal{F} .$$

(b) Two functions $f, g \in \mathcal{F}$ are h-equivalent iff there exists a permutation $\pi$ 
of X such that $f \circ \pi = g$. 

(c) $B_h$ is equal to the permutation orbit of any function f with histogram h, 
i.e., 

$$B_h = \bigcup_{\pi \in \mathcal{P}(X)} \{ f \circ \pi \} .$$

(d) Any subset $F \subseteq \mathcal{F}$ that is c.u.p. is uniquely defined by a union of pairwise 
disjoint basis classes. 

Proof. (a) The number $|\mathcal{H}|$ of different histograms is given by 

$$\binom{|X| + |\mathcal{Y}| - 1}{|X|} ,$$

i.e., the number of distinguishable distributions (e.g., p. 38). Two basis 
classes $B_{h_1}$ and $B_{h_2}$, $h_1 \neq h_2$, are disjoint because functions in different 
basis classes have different histograms. The union $\bigcup_{h \in \mathcal{H}} B_h = \mathcal{F}$ because 
every function in $\mathcal{F}$ has a histogram. 

(b) Let $f, g \in \mathcal{F}$ be two functions with the same histogram h. Then, for any 
$y \in \mathcal{Y}$, $f^{-1}(y)$ and $g^{-1}(y)$ are equal in size and there exists a bijective 
function $\pi_y : g^{-1}(y) \to f^{-1}(y)$ between these two subsets. Then the bijection 

$$\pi(x) = \pi_y(x) , \quad \text{where } y = g(x) ,$$

defines a permutation of X such that $f \circ \pi = g$, since $f(\pi(x)) = y = g(x)$ for 
every $x \in g^{-1}(y)$. Thus, h-equivalence 
implies existence of a permutation. On the other hand, the histogram of a 
function is invariant under permutation since for any $y \in \mathcal{Y}$ and $\pi \in \mathcal{P}(X)$ 

$$|(f \circ \pi)^{-1}(y)| = \sum_{x \in X} \delta(y, f(\pi(x))) = \sum_{x \in X} \delta(y, f(x)) = |f^{-1}(y)| ,$$

because $\pi$ is bijective and the addends can be resorted. Thus, existence 
of a permutation implies h-equivalence. 

(c) For a function f with histogram h, let $O_f = \bigcup_{\pi \in \mathcal{P}(X)} \{ f \circ \pi \}$ be the 
orbit of f under permutations $\pi$. By (b), all functions in $O_f$ have the 
same histogram and thus $O_f \subseteq B_h$. On the other hand, for any function 
$g \in B_h$ there exists by (b) a permutation $\pi$ such that $f \circ \pi = g$ and thus 
$B_h \subseteq O_f$. 

(d) For a subset $F \subseteq \mathcal{F}$, let $F_h = B_h \cap F$ (i.e., $F_h$ contains all functions in 
F with the same histogram h). By (a), all $F_h$ are pairwise disjoint and 
$F = \bigcup_{h \in \mathcal{H}} F_h$. Suppose $F_h \neq \emptyset$: By (c), any function 
$f \in F_h$ spans the orbit $B_h$, and since F is c.u.p. this orbit is contained in F. 
Thus $B_h \subseteq F$ and therefore $F_h = B_h$. 
Because basis classes are disjoint, the union 

$$F = \bigcup_{h \,:\, h \in \mathcal{H} \,\wedge\, F_h \neq \emptyset} B_h$$

is unique. 

□ 
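
Parts (a) and (c) of the lemma can be checked by enumeration for a small case. The sketch below assumes $|X| = |\mathcal{Y}| = 3$ and encodes functions as value tuples; `histogram` and `orbit` are hypothetical helper names.

```python
from collections import Counter, defaultdict
from itertools import permutations, product
from math import comb

n_x, n_y = 3, 3
functions = list(product(range(n_y), repeat=n_x))

def histogram(f):
    """Y-histogram of f as a tuple (h(0), ..., h(n_y - 1))."""
    counts = Counter(f)
    return tuple(counts[y] for y in range(n_y))

def orbit(f):
    """All functions f o pi for permutations pi of X."""
    return {tuple(f[pi[x]] for x in range(n_x))
            for pi in permutations(range(n_x))}

# Group functions into basis classes B_h by their histogram.
basis_classes = defaultdict(set)
for f in functions:
    basis_classes[histogram(f)].add(f)

# (a) number of basis classes; (c) each class is the orbit of any member.
print(len(basis_classes), comb(n_x + n_y - 1, n_x))
print(all(orbit(f) == B for B in basis_classes.values() for f in B))
```

Because every c.u.p. subset is a union of these classes, as in part (d), counting such subsets reduces to counting subsets of the set of basis classes.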

Proof of Theorem 2. By Lemma 1(a), the number of different basis classes is given 
by 

$$\binom{|X| + |\mathcal{Y}| - 1}{|X|} .$$

The number of different, non-empty unions of basis classes (equal to the 
cardinality of the power set of the set of all basis classes minus one for the empty set) is 
given by 

$$2^{\binom{|X| + |\mathcal{Y}| - 1}{|X|}} - 1 .$$

By Lemma 1(d), this is the number of non-empty subsets of $\mathcal{F}$ that are c.u.p. □ 